#LLM Evaluation
2 articles
ChatGPT Paper Review - Connecting Context Design to Safe Behavior
We selected three recently released papers and use them to explain (1) the systematization of context engineering, (2) contamination and integrity problems in evaluation, and (3) a modularized perc...
ChatGPT Paper Review - Instruction Following, Safety Alignment, and Agentic RAG
Explains new papers on instruction-following evaluation (FireBench), a theoretical clarification of RLHF alignment, the stability of internal representations, and a SoK (systematization of knowledge) for agentic RAG.